fix: do not set default for resources #2471
Conversation
tedzhouhk
left a comment
Support this change, but it would be better to have k8s experts review the code.
Walkthrough
The changes adjust container resource specifications across Dynamo components: Frontend and Planner drop CPU/memory limits; the Worker retains only a GPU limit; and tests are updated accordingly. The LeaderWorkerSet controller test updates the expected Pod resources to include CPU, memory, and GPU limits. No public APIs or control flow are changed.
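To make the walkthrough concrete, here is a minimal sketch of the resulting defaults (not the actual operator code; the function names are illustrative): Frontend and Planner inject no limits at all, while Worker keeps only a GPU limit.

```go
package dynamo

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// frontendDefaults (and the analogous planner defaults): no CPU/memory
// limits are injected; whatever the user sets in the deployment spec
// passes through untouched.
func frontendDefaults() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{}
}

// workerDefaults: only the GPU limit survives as a built-in default.
func workerDefaults() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("1"),
		},
	}
}
```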
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)
Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/product/migration-guide for migration instructions.
Actionable comments posted: 0
🧹 Nitpick comments (2)
deploy/cloud/operator/internal/dynamo/component_worker.go (1)
45-45: Nit: Use a typed ResourceName for the GPU key
Minor readability/consistency improvement that avoids any ambiguity around typed keys in ResourceList.
Apply this diff:
- "nvidia.com/gpu": resource.MustParse("1"), + corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("1"),deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)

708-711: Add a test for the default case (no CPU/mem limits in spec)
To guard the new default, add a case where spec.limits omits CPU/memory and assert that the generated Pod's container.Resources.Limits includes only GPU (and no CPU/memory). Also verify the /dev/shm EmptyDir size handling when no memory limit is present (it should either be unset or follow the intended fallback).
Here’s a minimal example you can adapt:
```go
func Test_generateLeaderWorkerSet_defaults_NoCpuMemLimits(t *testing.T) {
	g := gomega.NewWithT(t)
	s := scheme.Scheme
	_ = v1alpha1.AddToScheme(s)
	_ = corev1.AddToScheme(s)
	_ = leaderworkersetv1.AddToScheme(s)

	dcd := &v1alpha1.DynamoComponentDeployment{
		ObjectMeta: metav1.ObjectMeta{Name: "default-lws", Namespace: "default"},
		Spec: v1alpha1.DynamoComponentDeploymentSpec{
			DynamoComponent: "comp",
			DynamoTag:       "tag",
			DynamoComponentDeploymentSharedSpec: v1alpha1.DynamoComponentDeploymentSharedSpec{
				ComponentType:   string(commonconsts.ComponentTypeWorker),
				ServiceName:     "svc",
				DynamoNamespace: ptr.To("default"),
				Multinode:       &v1alpha1.MultinodeSpec{NodeCount: 2},
				Resources: &common.Resources{
					Requests: &common.ResourceItem{CPU: "300m", Memory: "500Mi"},
					Limits:   &common.ResourceItem{GPU: "1"}, // no CPU/memory limits
				},
				ExtraPodSpec: &dynamoCommon.ExtraPodSpec{
					MainContainer: &corev1.Container{Image: "test:latest"},
				},
			},
		},
	}

	fakeClient := fake.NewClientBuilder().WithScheme(s).WithObjects(dcd).Build()
	r := &DynamoComponentDeploymentReconciler{
		Client:   fakeClient,
		Recorder: record.NewFakeRecorder(10),
	}

	got, _, err := r.generateLeaderWorkerSet(context.Background(), generateResourceOption{
		dynamoComponentDeployment: dcd,
		instanceID:                ptr.To(0),
	})
	g.Expect(err).ToNot(gomega.HaveOccurred())

	// Ensure only GPU is limited; CPU/memory limits are absent
	limits := got.Spec.LeaderWorkerTemplate.LeaderTemplate.Spec.Containers[0].Resources.Limits
	g.Expect(limits).To(gomega.HaveKeyWithValue(corev1.ResourceName("nvidia.com/gpu"), resource.MustParse("1")))
	g.Expect(limits).ToNot(gomega.HaveKey(corev1.ResourceCPU))
	g.Expect(limits).ToNot(gomega.HaveKey(corev1.ResourceMemory))
}
```
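If you also want to cover the /dev/shm handling mentioned above, a follow-up assertion along these lines could be appended to the test. The volume name "shm" and the Volumes field path are assumptions about the generated spec (mirroring the Containers path used above), not confirmed API:

```go
	// Hypothetical: locate the shared-memory volume in the generated pod spec.
	var shm *corev1.Volume
	vols := got.Spec.LeaderWorkerTemplate.LeaderTemplate.Spec.Volumes
	for i := range vols {
		if vols[i].Name == "shm" { // assumed volume name
			shm = &vols[i]
		}
	}
	if shm != nil && shm.EmptyDir != nil {
		// With no memory limit in the spec, expect no derived sizeLimit
		// (adjust if the controller defines a different fallback).
		g.Expect(shm.EmptyDir.SizeLimit).To(gomega.BeNil())
	}
```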
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1 hunks)
- deploy/cloud/operator/internal/dynamo/component_frontend.go (0 hunks)
- deploy/cloud/operator/internal/dynamo/component_planner.go (0 hunks)
- deploy/cloud/operator/internal/dynamo/component_planner_test.go (0 hunks)
- deploy/cloud/operator/internal/dynamo/component_worker.go (1 hunks)
💤 Files with no reviewable changes (3)
- deploy/cloud/operator/internal/dynamo/component_planner_test.go
- deploy/cloud/operator/internal/dynamo/component_frontend.go
- deploy/cloud/operator/internal/dynamo/component_planner.go
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
deploy/cloud/operator/internal/dynamo/component_worker.go (2)
45-45: LGTM: CPU/memory limits dropped; GPU limit retained
This aligns with the PR objective ("do not set cpu/mem limit by default") while keeping the GPU limit.
45-45: GPU requests for nvidia.com/gpu rely on API defaulting
We don't explicitly set Requests["nvidia.com/gpu"] in component_worker.go (line 45), so the Kubernetes API server must default requests to match limits for extended resources. Confirm that your target clusters support this behavior (Kubernetes ≥ 1.8). If not, mirror the GPU limit into requests in code to avoid unschedulable pods.
- component_worker.go:45 – only "nvidia.com/gpu" is set under Limits, with no matching Requests.
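If explicit mirroring is preferred over relying on API-server defaulting, here is a minimal sketch using the plain core/v1 types (not the operator's actual helper code):

```go
package dynamo

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// withMirroredGPURequest returns worker resources where the GPU request
// explicitly matches the GPU limit (extended resources require the two
// to be equal when both are set), instead of depending on the API server
// to default requests from limits.
func withMirroredGPURequest() corev1.ResourceRequirements {
	gpu := corev1.ResourceName("nvidia.com/gpu")
	req := corev1.ResourceRequirements{
		Limits: corev1.ResourceList{gpu: resource.MustParse("1")},
	}
	req.Requests = corev1.ResourceList{gpu: req.Limits[gpu]}
	return req
}
```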
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)
708-711: LGTM: Test now expects CPU/memory/GPU limits when specified in the spec
Good adjustment to validate explicit limits provided via DynamoComponentDeploymentSpec.
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Overview:
do not set resources by default